Statistical Mechanics of Deep Learning

Yasaman Bahri; Jonathan Kadmon; Jeffrey Pennington; Sam S. Schoenholz; Jascha Sohl-Dickstein; Surya Ganguli

doi:10.1146/annurev-conmatphys-031119-050745

Annual Review of Condensed Matter Physics

Volume 11, 2020

Review Article


Yasaman Bahri¹, Jonathan Kadmon², Jeffrey Pennington¹, Sam S. Schoenholz¹, Jascha Sohl-Dickstein¹, and Surya Ganguli¹,²

Affiliations: ¹Google Brain, Google Inc., Mountain View, California 94043, USA; ²Department of Applied Physics, Stanford University, Stanford, California 94035, USA; email: [email protected]
Vol. 11:501-528 (Volume publication date March 2020) https://doi.org/10.1146/annurev-conmatphys-031119-050745
First published as a Review in Advance on December 09, 2019
Copyright © 2020 by Annual Reviews. All rights reserved

Abstract

The recent striking success of deep neural networks in machine learning raises profound questions about the theoretical principles underlying their success. For example, what can such deep networks compute? How can we train them? How does information propagate through them? Why can they generalize? And how can we teach them to imagine? We review recent work in which methods of physical analysis rooted in statistical mechanics have begun to provide conceptual insights into these questions. These insights yield connections between deep learning and diverse physical and mathematical topics, including random landscapes, spin glasses, jamming, dynamical phase transitions, chaos, Riemannian geometry, random matrix theory, free probability, and nonequilibrium statistical mechanics. Indeed, the fields of statistical mechanics and machine learning have long enjoyed a rich history of strongly coupled interactions, and recent advances at the intersection of statistical mechanics and deep learning suggest these interactions will only deepen going forward.

Keywords: chaos, dynamical phase transitions, interacting particle systems, jamming, machine learning, neural networks, nonequilibrium statistical mechanics, random matrix theory, spin glasses


